A Look at Crime in the United States

Author: Sadie Dodds
Affiliation: George Washington University

Published: December 15, 2025

Research Questions

Final Project for DATS 2102

Agenda:

  1. What is the data that will be studied?
  2. How do state populations compare to crime rates?
  3. How do reported crimes compare to arrests for those crimes?
  4. How do violent crime rates compare to property crime?
  5. What do we see about rates of crime overall?
  6. What do we see about rates of arrests overall?
  7. What were the decisions made to craft these visualizations?
  8. How was AI used in the production of this project?

1 The Data

Task: Let’s take a look at our dataset.

1.1 What is our dataset?

This dataset covers the United States in 2014 and records the rates of various types of crime in each state. It comes from Social Explorer, with the crime data provided by the Uniform Crime Reporting (UCR) Program and the population data provided by the Census Bureau.

Here, I will analyze the crime data using maps, looking particularly at rates per 100,000 population for each state. I will primarily compare arrests for violent and property crimes against reported violent and property crimes. Some basic questions I began with are:

  • Where can we see an increase or decrease of violent and property crimes in the country?
  • Where are there more or fewer arrests than reports? Why might this be?
  • Do states with higher populations have higher rates of violent and property crime? Higher or lower reports? Higher or lower arrests?
  • Is there a significant difference between where violent or property crimes are committed?
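
Because every comparison in this report relies on rates per 100,000 population, it may help to see what that normalization means. The following is a minimal sketch and not part of the original analysis: the helper names `rate_per_100k` and `implied_count` are hypothetical, and the Alabama figures are the ones listed in the first row of the dataset preview in section 1.2.

```python
def rate_per_100k(count: float, population: int) -> float:
    """Convert a raw count into a rate per 100,000 residents."""
    return count / population * 100_000


def implied_count(rate: float, population: int) -> float:
    """Invert a rate to recover the approximate raw count it implies."""
    return rate * population / 100_000


# Alabama, 2014: population and total crime rate as listed in the dataset
alabama_population = 4_849_377
alabama_rate = 3485.891074

count = implied_count(alabama_rate, alabama_population)
print(f"Implied reported crimes: {count:,.0f}")
print(f"Round-trip rate: {rate_per_100k(count, alabama_population):.6f}")
```

Working with rates rather than counts is what makes a small state and a large state comparable on the same map.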

1.2 Importation and Wrangling

Next, let’s load our dataset of crime information:

Code
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

sns.set(style="white")
plt.rcParams['figure.figsize'] = (7,4)

raw = pd.read_csv("crime.csv")
raw.head()
Geo_FIPS Geo_Name Geo_QName SE_T001_001 SE_T003_001 SE_T003_002 SE_T003_003 SE_T009_001 SE_T009_002 SE_T009_003 SE_NV001_001 SE_NV001_002 SE_NV001_003 SE_NV002_001 SE_NV002_002 SE_NV002_003
0 1 Alabama Alabama 4849377 3485.891074 414.486232 3071.404842 351.735903 46.109016 305.565024 321.360868 43.118941 278.262548 30.313172 3.051938 27.302476
1 2 Alaska Alaska 719773 3455.672830 647.009543 2808.663287 718.143081 222.431239 495.572910 613.248899 204.369989 408.739978 104.616317 17.922317 86.693999
2 4 Arizona Arizona 6731484 3504.724961 384.729430 3119.995531 779.129832 138.141902 640.987931 674.323819 125.470104 548.809148 104.820869 12.642086 92.193638
3 5 Arkansas Arkansas 2966369 3579.965945 452.876901 3127.089044 743.939813 151.970304 591.969509 650.997903 139.294875 511.601894 93.076755 12.641718 80.333903
4 6 California California 38802500 2831.258295 390.603698 2440.654597 621.421300 273.657625 347.763675 552.082984 253.280072 298.802912 69.333162 20.374976 48.958186

And a map of the United States:

Code
import geopandas as gpd
shapefile_path = "map_20m.geojson"

gdf = gpd.read_file(shapefile_path)

usa = gdf.to_crs("ESRI:102003")
ax = usa.plot()
ax.set_axis_off()

We’ve got to wrangle the data to make it easier to work with. I’ll merge the crime data with the geographic US map data and remove the columns that include redundant geographical information.

Code
polys = usa.copy()

raw2 = pd.merge(polys, raw, left_on='NAME', right_on='Geo_Name', how='left')
Code
cols_to_drop = ["STATE", "LSAD", "CENSUSAREA", "Geo_FIPS", "Geo_Name", "Geo_QName"]
dropped = raw2.drop(columns=cols_to_drop, errors="ignore")

crime = dropped.copy()
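
A left merge on state names silently leaves empty values wherever the two tables spell a name differently, so a quick check can be worthwhile. The sketch below uses small made-up tables (`shapes` and `rates` are hypothetical stand-ins, and the mismatch is invented for illustration) together with the `indicator=True` option of `pd.merge`. In the actual data, the inspection in section 1.3.3 shows a population value for all 49 rows, which suggests every shape found a match.

```python
import pandas as pd

# Hypothetical stand-ins for the map table and the crime table
shapes = pd.DataFrame({"NAME": ["Alabama", "Arizona", "District of Columbia"]})
rates = pd.DataFrame({
    "Geo_Name": ["Alabama", "Arizona", "Alaska"],
    "SE_T001_001": [4849377, 6731484, 719773],
})

# indicator=True adds a "_merge" column recording where each row came from
checked = pd.merge(shapes, rates, left_on="NAME", right_on="Geo_Name",
                   how="left", indicator=True)

# Rows marked "left_only" are map shapes with no matching crime record
unmatched = checked.loc[checked["_merge"] == "left_only", "NAME"].tolist()
print(unmatched)
```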

1.3 Dataset

1.3.2 Variables

  • NAME: Name of State
  • geometry: Shape of State
  • SE_T001_001: Total Population (2014 est.)
  • SE_T003_001: Total Violent and Property Crimes Rate
  • SE_T003_002: Total Violent and Property Crimes Rate: Violent Crimes Rate
  • SE_T003_003: Total Violent and Property Crimes Rate: Property Crimes Rate
  • SE_T009_001: Total Violent and Property Crime Arrests Rate
  • SE_T009_002: Total Violent and Property Crime Arrests Rate: Violent Crime Arrests Rate
  • SE_T009_003: Total Violent and Property Crime Arrests Rate: Property Crime Arrests Rate
  • SE_NV001_001: Total Violent and Property Crime Arrests (Rate per 100,000 Population) (Adults Only)
  • SE_NV001_002: Violent Crime Arrests (Rate per 100,000 Population) (Adults Only)
  • SE_NV001_003: Property Crime Arrests (Rate per 100,000 Population) (Adults Only)
  • SE_NV002_001: Total Violent and Property Crime Arrests (Rate per 100,000 Population) (Juveniles Only)
  • SE_NV002_002: Violent Crime Arrests (Rate per 100,000 Population) (Juveniles Only)
  • SE_NV002_003: Property Crime Arrests (Rate per 100,000 Population) (Juveniles Only)

1.3.3 Inspection

Code
crime.shape
(49, 16)
Code
crime.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 49 entries, 0 to 48
Data columns (total 16 columns):
 #   Column        Non-Null Count  Dtype   
---  ------        --------------  -----   
 0   GEO_ID        49 non-null     object  
 1   NAME          49 non-null     object  
 2   geometry      49 non-null     geometry
 3   SE_T001_001   49 non-null     int64   
 4   SE_T003_001   49 non-null     float64 
 5   SE_T003_002   49 non-null     float64 
 6   SE_T003_003   49 non-null     float64 
 7   SE_T009_001   47 non-null     float64 
 8   SE_T009_002   47 non-null     float64 
 9   SE_T009_003   47 non-null     float64 
 10  SE_NV001_001  47 non-null     float64 
 11  SE_NV001_002  47 non-null     float64 
 12  SE_NV001_003  47 non-null     float64 
 13  SE_NV002_001  47 non-null     float64 
 14  SE_NV002_002  47 non-null     float64 
 15  SE_NV002_003  47 non-null     float64 
dtypes: float64(12), geometry(1), int64(1), object(2)
memory usage: 6.2+ KB
Code
crime.describe()
SE_T001_001 SE_T003_001 SE_T003_002 SE_T003_003 SE_T009_001 SE_T009_002 SE_T009_003 SE_NV001_001 SE_NV001_002 SE_NV001_003 SE_NV002_001 SE_NV002_002 SE_NV002_003
count 4.900000e+01 49.000000 49.000000 49.000000 47.000000 47.000000 47.000000 47.000000 47.000000 47.000000 47.000000 47.000000 47.000000
mean 6.463281e+06 2877.455898 352.462890 2524.993007 626.893848 128.453316 498.426145 534.005712 114.518977 419.458236 92.872053 13.878152 78.941134
std 7.190292e+06 826.808119 177.707792 684.438545 186.864497 58.207787 151.650380 163.500777 52.241962 130.936949 39.541516 7.887039 36.438798
min 5.841530e+05 1619.472614 101.665917 1517.806698 36.424731 16.998208 19.426523 26.104390 10.472110 15.632280 10.320340 3.051938 3.794243
25% 1.881503e+06 2208.213350 238.414514 1979.974476 505.952769 86.437279 415.979736 425.470641 78.879427 334.577211 65.778614 8.296043 53.617750
50% 4.649676e+06 2766.664810 316.347760 2448.174494 621.421300 117.396655 503.294052 526.192152 102.368708 414.103839 95.334311 12.642086 81.387267
75% 7.061530e+06 3413.986845 414.486232 2983.709094 742.384315 155.612971 614.876527 637.772185 141.222210 520.986494 115.395630 16.239322 100.009881
max 3.880250e+07 6426.840170 1244.359858 5182.480312 1085.704147 275.686054 856.229171 948.788710 253.280072 756.294797 208.682005 36.874181 183.237216
Code
crime.nunique(dropna=True)
GEO_ID          49
NAME            49
geometry        49
SE_T001_001     49
SE_T003_001     49
SE_T003_002     49
SE_T003_003     49
SE_T009_001     47
SE_T009_002     47
SE_T009_003     47
SE_NV001_001    47
SE_NV001_002    47
SE_NV001_003    47
SE_NV002_001    47
SE_NV002_002    47
SE_NV002_003    47
dtype: int64
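
The non-null counts above show that the arrest columns hold values for only 47 of the 49 rows. A short check like the following sketch would name the affected states. The table `toy` is a hypothetical four-row example: the Alabama and Alaska values come from the dataset preview, and the empty entries stand in for the two states that this report later identifies as lacking arrest data.

```python
import numpy as np
import pandas as pd

# Toy table: Alabama and Alaska values come from the dataset preview;
# the NaN entries stand in for states without arrest data
toy = pd.DataFrame({
    "NAME": ["Alabama", "Alaska", "Florida", "Illinois"],
    "SE_T009_001": [351.735903, 718.143081, np.nan, np.nan],
})

# Boolean mask of rows where the arrest rate is missing
missing = toy["SE_T009_001"].isna()
missing_states = toy.loc[missing, "NAME"].tolist()
print(missing_states)
```

On the full table, the same mask would be built from `crime["SE_T009_001"].isna()`.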

2 Population vs Crime Rate

Task: Compare the total population of each state to its total (reported) violent and property crime rate using static choropleths.

Code
import geopandas as gpd, pandas as pd

ax = crime.plot(column="SE_T001_001",
    cmap="OrRd",
    scheme="Quantiles",
    k=5,
    legend=True,
    legend_kwds= dict(title="Total Population",loc='upper left', bbox_to_anchor=(1.01, 1)),
    edgecolor="black",
    linewidth=0.2)
ax.set_axis_off()

Code
ax = crime.plot(column="SE_T003_001",
    cmap="OrRd",
    scheme="Quantiles",
    k=5,
    legend=True,
    legend_kwds= dict(title="Total Violent and Property Crimes Rate",loc='upper left', bbox_to_anchor=(1.01, 1)),
    edgecolor="black",
    linewidth=0.2)
ax.set_axis_off()

Here, we get an overview of where violent and property crimes are more common as well as how this compares to populations. Interestingly, we see that the American South and West appear to have higher overall rates of violent and property crime.
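
The visual comparison of the two maps could be complemented with a single number. One option, sketched here as an assumption rather than something done in the original project, is a Spearman rank correlation between population and crime rate, computed as the Pearson correlation of the ranks. The sample uses only the five states shown in the dataset preview, so the result illustrates the method and says nothing about the country as a whole.

```python
import pandas as pd

# First five rows of the dataset preview (population and total crime rate)
sample = pd.DataFrame({
    "NAME": ["Alabama", "Alaska", "Arizona", "Arkansas", "California"],
    "SE_T001_001": [4849377, 719773, 6731484, 2966369, 38802500],
    "SE_T003_001": [3485.891074, 3455.672830, 3504.724961,
                    3579.965945, 2831.258295],
})

# Spearman correlation = Pearson correlation of the ranks
pop_rank = sample["SE_T001_001"].rank()
rate_rank = sample["SE_T003_001"].rank()
rho = pop_rank.corr(rate_rank)
print(f"Spearman rho (5 states only): {rho:.2f}")
```

Applied to the full table, the same two `rank()` calls on `crime` would give a value near zero if population and crime rate are unrelated, and a value near 1 if more populous states tend to have higher rates.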

3 Reports vs Arrests Rate

Task: After understanding where violent and property crimes are reported, look at how this compares to where they result in arrests using static choropleths.

Code
ax = crime.plot(column="SE_T003_001",
    cmap="OrRd",
    scheme="Quantiles",
    k=5,
    legend=True,
    legend_kwds= dict(title="Total Violent and Property Crimes Rate",loc='upper left', bbox_to_anchor=(1.01, 1)),
    edgecolor="black",
    linewidth=0.2)
ax.set_axis_off()

Code
ax = crime.plot(column="SE_T009_001",
    cmap="OrRd",
    scheme="Quantiles",
    k=5,
    legend=True,
    legend_kwds= dict(title="Total Violent and Property Crime Arrests Rate",loc='upper left', bbox_to_anchor=(1.01, 1)),
    edgecolor="black",
    linewidth=0.2)
ax.set_axis_off()

Arrest rates are clearly much lower than the rates of reported violent and property crimes. Also important to note here is the lack of arrest data for Florida and Illinois. For Florida, this is particularly significant because the state has a high rate of reported violent and property crimes.
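
The gap between reported crimes and arrests can be made concrete by dividing one rate by the other. The sketch below does this for the five states in the dataset preview; the column name `arrests_per_100_reports` is a hypothetical addition, not a field in the dataset. The ratio is only a rough indicator, on the assumption that arrests and reports in a given year need not refer to the same incidents.

```python
import pandas as pd

# Reported crime rate and arrest rate for the five previewed states
sample = pd.DataFrame({
    "NAME": ["Alabama", "Alaska", "Arizona", "Arkansas", "California"],
    "SE_T003_001": [3485.891074, 3455.672830, 3504.724961,
                    3579.965945, 2831.258295],
    "SE_T009_001": [351.735903, 718.143081, 779.129832,
                    743.939813, 621.421300],
})

# Arrests per 100 reported crimes (hypothetical derived column)
sample["arrests_per_100_reports"] = (
    sample["SE_T009_001"] / sample["SE_T003_001"] * 100
)
print(sample[["NAME", "arrests_per_100_reports"]].round(1))
```

A derived column like this could be mapped in the same way as the other rates, which would show the drop directly instead of asking the reader to compare two legends.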

4 Violent vs Property Crime Rates

Task: Compare violent and property crime rates against each other to better understand this variable as a whole using static choropleths.

Code
ax = crime.plot(column="SE_T003_002",
    cmap="OrRd",
    scheme="Quantiles",
    k=5,
    legend=True,
    legend_kwds= dict(title="Total Violent Crimes Rate",loc='upper left', bbox_to_anchor=(1.01, 1)),
    edgecolor="black",
    linewidth=0.2)
ax.set_axis_off()

Code
ax = crime.plot(column="SE_T003_003",
    cmap="OrRd",
    scheme="Quantiles",
    k=5,
    legend=True,
    legend_kwds= dict(title="Total Property Crimes Rate",loc='upper left', bbox_to_anchor=(1.01, 1)),
    edgecolor="black",
    linewidth=0.2)
ax.set_axis_off()

Although the distribution of crimes across the country appears similar to what we’ve been seeing with Total Violent and Property Crimes Rates (with most appearing in the South and West and the least appearing in New England), the states with the most violent crimes are not quite the same as the states with the most property crimes.
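
One way to check the observation that the highest-ranking states differ between the two crime types is to compare the top few states for each rate. This is a minimal sketch using only the five states from the dataset preview, with `k = 2` chosen arbitrarily for illustration; on the full table a larger `k` would be more informative.

```python
import pandas as pd

# Violent and property crime rates for the five previewed states
sample = pd.DataFrame({
    "NAME": ["Alabama", "Alaska", "Arizona", "Arkansas", "California"],
    "SE_T003_002": [414.486232, 647.009543, 384.729430,
                    452.876901, 390.603698],
    "SE_T003_003": [3071.404842, 2808.663287, 3119.995531,
                    3127.089044, 2440.654597],
}).set_index("NAME")

k = 2  # how many top states to compare (arbitrary for this sketch)
top_violent = set(sample["SE_T003_002"].nlargest(k).index)
top_property = set(sample["SE_T003_003"].nlargest(k).index)

print("In both:", sorted(top_violent & top_property))
print("Violent only:", sorted(top_violent - top_property))
print("Property only:", sorted(top_property - top_violent))
```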

5 Crime Rate Breakdown

Task: With a basic idea of each of these comparisons, compare all the information for each state’s violent and property crime rates using an interactive choropleth.

Code
import json
import numpy as np
import plotly.express as px
from plotly.colors import sequential

col = "SE_T003_001"

polys_ll = polys.to_crs(epsg=4326)
crime["GEO_ID"] = crime["GEO_ID"].astype(str)
polys_ll["GEO_ID"] = polys_ll["GEO_ID"].astype(str)
geojson = json.loads(polys_ll.to_json())

crime["bin5"] = pd.qcut(crime[col], q=5, duplicates="drop")
crime["bin5_label"] = crime["bin5"].apply(lambda x: f"{x.left:.2f} - {x.right:.2f}")

fig = px.choropleth(
    crime,
    geojson=geojson,
    locations="GEO_ID",
    featureidkey="properties.GEO_ID",
    color="bin5_label",
    color_discrete_sequence=px.colors.sequential.OrRd[2:8],
    category_orders={
        "bin5_label": crime.sort_values(col)["bin5_label"].unique()
    },
    hover_data={
        "NAME": True,
        col: True,
        "SE_T001_001": True,
        "SE_T003_002": True,
        "SE_T003_003": True
    },
    labels={
        "bin5_label": "Total Crime Rate",
        "crime_q": "Quantile bin",
        col: "Total Crime Rate",
        "SE_T001_001": "Total Population",
        "SE_T003_002": "Violent Crime Rate",
        "SE_T003_003": "Property Crime Rate",
        "NAME": "State"
    }
)

fig.update_geos(fitbounds="locations", visible=False)
fig.update_layout(title_text="Total Violent and Property Crime Rate<br>(Hover for breakdown)")
fig.show()

The range of colors used in this style of map is narrower than in the previous maps, so although we are seeing the same visual breakdown, states with a high rate of crime stand out less distinctly.

6 Arrest Rate Breakdown

Task: The dataset provides additional information about arrests. Look at how this information affects our understanding of this variable using an interactive choropleth.

Code
col = "SE_T009_001"

polys_ll = polys.to_crs(epsg=4326)
crime["GEO_ID"] = crime["GEO_ID"].astype(str)
polys_ll["GEO_ID"] = polys_ll["GEO_ID"].astype(str)
geojson = json.loads(polys_ll.to_json())

crime["bin5"] = pd.qcut(crime[col], q=5, duplicates="drop")
crime["bin5_label"] = crime["bin5"].apply(lambda x: f"{x.left:.2f} - {x.right:.2f}")

fig = px.choropleth(
    crime,
    geojson=geojson,
    locations="GEO_ID",
    featureidkey="properties.GEO_ID",
    color="bin5_label",
    color_discrete_sequence=px.colors.sequential.OrRd[2:8],
    category_orders={
        "bin5_label": crime.sort_values(col)["bin5_label"].unique()
    },
    hover_data={
        "NAME": True,
        col: True,
        "SE_T009_001": True,
        "SE_T001_001": True,
        "SE_NV001_001": True,
        "SE_NV002_001": True
    },
    labels={
        "bin5_label": "Total Arrests Rate",
        "crime_q": "Quantile bin",
        col: "Total Arrests Rate",
        "SE_T009_001": "Total Arrests Rate",
        "SE_T001_001": "Total Population",
        "SE_NV001_001": "Total Adult Arrests (per 100k)",
        "SE_NV002_001": "Total Juvenile Arrests (per 100k)",
        "NAME": "State"
    }
)

fig.update_geos(fitbounds="locations", visible=False)
fig.update_layout(title_text="Total Violent and Property Arrests Rate<br>(Hover for breakdown)")
fig.show()

The color issue from the previous map appears here as well, along with the earlier problem of missing data for Illinois and Florida.

7 Visualization Decisions

Task: Discuss projection/CRS choices made throughout the project.

One of the first and most important decisions that I made in this project was the color scheme. Throughout the process, I changed the colors a few times to see what worked best, but the colors I decided on (an orange and red scheme which ranges from white to dark red) felt the most logical. In part, this had to do with the content being about crime, particularly violent crime: a more upbeat color palette of yellows or purples, or even a paler palette of light blues or pinks, would not have suited the subject.

Also important in relation to the use of colors was my use of a binned, not continuous, color scheme, where I sorted the data into five quantile bins for each map before or while plotting. This makes the variation in each variable across the country more obvious.
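
The binning step can be sketched on its own with `pd.qcut`, which splits values into bins holding roughly equal numbers of observations. The ten rates below are hypothetical values chosen only for illustration, and the label format is one possible choice rather than the one used in the maps.

```python
import pandas as pd

# Hypothetical rates per 100,000 for ten regions (illustrative values only)
rates = pd.Series([1619.5, 1980.0, 2208.2, 2450.1, 2766.7,
                   2990.3, 3414.0, 3580.0, 4100.9, 6426.8])

# Quantile binning: five bins with (roughly) equal numbers of observations
bins = pd.qcut(rates, q=5)

# Readable labels built from each interval's edges
labels = [f"{iv.left:.0f} to {iv.right:.0f}" for iv in bins.cat.categories]
binned = bins.cat.rename_categories(labels)

print(binned.value_counts(sort=False))
```

Because each bin holds about a fifth of the states, a few extreme values cannot compress the rest of the map into one or two shades, which is the main advantage over a continuous scale for skewed data.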

The final decision that I made was to use, for most of the maps, a map of the US in the ESRI:102003 projection (USA Contiguous Albers Equal Area Conic), which curves upwards instead of down. This version of the map looks more accurate and familiar; however, I could not use it in plotly, where I had to convert the map to GeoJSON in EPSG:4326 for the code to produce a map.

8 AI Acknowledgment

Task: Explain the use of AI in the making of this report.

The AI used in this project was ChatGPT. It was used to produce the final two maps, when there was no obvious way to make the maps use binning to color the states. I asked whether this was possible within the plotly code; it answered ‘no’, explaining that plotly uses continuous coloring, and then helped me construct the part of the code that bins the Total Crime Rate and Total Arrests Rate.